Review of Oxford Temperature Measurements
1850 - 2019
Introduction
There are a few simple rules when looking at data.
Unfortunately, there isn't a great deal of data to go on. Surface temperatures have been measured to any degree of accuracy for only a few hundred years. Atmospheric CO2 levels are in an even more perilous state than temperatures.
I have 2 simple data sets.
Is it possible to generalise from the particular to the universal in terms of AGW? Well, it must be: according to the theory of back radiation from a trace gas, CO2, everywhere on the planet must be getting hotter.
I will be using Python with its numerous Scientific and Statistical code libraries to help me. I will present the findings, with a full reference to the techniques and computer code used to arrive at conclusions.
To start with, I will use the data from:
to analyse the basic data.
There is also a nice table from Oxford University's School of Geography and Environment which may come in handy.
robs = pd.read_html('https://www.geog.ox.ac.uk/research/climate/rms/summary.html')
robs = pd.DataFrame(robs[0])
robs.head()
Setup the Python Environment
Import the libraries to be used. These include packages from scipy and scikit-learn.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import scipy.stats as stats
from scipy.stats import kde
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans
import seaborn as sns
import folium
from IPython.display import display
from mpl_toolkits.mplot3d import Axes3D
%matplotlib inline
Read the Met Office Data into Pandas
Here is a map showing the approximate location of the weather station given the Met Office co-ordinates. I believe it's actually at the Radcliffe Observatory.
station = (51.7607,-1.2625)
my_map = folium.Map(location=station,zoom_start=17)
#inserting marker
folium.Marker(
station,
popup = 'Oxford Weather Station',
tooltip='Oxford Weather Station'
).add_to(my_map)
my_map
Read the CSV file containing the data.
It is a local copy of the original data with the extraneous headers removed
oweather = pd.read_csv('OxfordWeather2019copy.csv')
#Use a cheeky '\033[1m' to print it in bold!
print('\033[1m'+'Length of Data set is ',len(oweather.Year))
#Count the null values in each column
oweather.isnull().sum()
The main DataFrame is 'oweather'. A Pandas DataFrame (DF) is similar to a spreadsheet but the data manipulations are easier and replicable. They are also easy to critique and amend.
It's worthwhile checking the data types out to be sure that everything is OK.
oweather.dtypes
I can check out one year's worth of data using the Pandas 'loc' method.
Use a lambda function to pick out the data for the Year you are interested in.
oweather.loc[lambda oweather: oweather['Year'] == 1916]
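For comparison: the same row selection can be written as a plain boolean mask, which many find more readable than the lambda form. A minimal, self-contained sketch on an invented toy frame (the real `oweather` has more columns):

```python
import pandas as pd

# Toy frame standing in for 'oweather' (values invented for illustration)
df = pd.DataFrame({'Year': [1915, 1916, 1916, 1917],
                   'TmaxC': [14.1, 15.2, 13.8, 14.5]})

via_lambda = df.loc[lambda d: d['Year'] == 1916]   # lambda form, as above
via_mask = df[df['Year'] == 1916]                  # equivalent boolean mask
print(via_mask)
```

Both return the same rows; the lambda form is mainly useful in method chains.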
Read the CO2 data into Pandas
I want to look at the CO2 levels measured at Mauna Loa to have data on the increase in Parts Per Million.
So let's get the data from this source.
df_co2 = pd.read_csv("monaluaco2stripped.txt",delimiter=r"\s+")
df_co2.drop(['unc'],axis=1,inplace=True)
df_co2.rename(columns={'mean':'PPM'},inplace=True)
df_co2.head(2)
This is the graph of CO2 growth we are not used to seeing: linear growth since the late 1950s. In the lower panel the Y axis extends to 10,000. Had it been drawn accurately to reflect parts per million, there would be nothing to see!
fig=plt.figure()
ax1 = fig.add_axes([0.1,0.5,0.8,0.5],ylim=(300,440))
plt.ylabel('PPM')
plt.title('Fig1. CO2 Increase PPM',fontsize=20)
ax2 = fig.add_axes([0.1,0.1,0.8,0.3],ylim=(200,10000))
plt.title('CO2 Increase PPM in Thousands')
ax1.plot(df_co2.year,df_co2.PPM)
plt.ylabel('PPM in Thousands')
ax2.plot(df_co2.year,df_co2.PPM)
plt.xlabel('Year');
It's possible to find out the details of a linear regression for the CO2 data using scikit-learn's LinearRegression.
lm = LinearRegression()
X = df_co2[['year']]
Y = df_co2['PPM']
lm.fit(X, Y)
Yhat = lm.predict(X)
C = lm.intercept_
M = lm.coef_
print(f'c = {C:2.5f}')
print(f'm = {M[0]:2.5f}')
The linear equation for CO2 is
Y = 1.56567(X) − 2758.94368
where X is the Year.
I can check the R2 value and some other details.
slope, intercept, r_value, p_value, std_err = stats.linregress(df_co2['year'], df_co2['PPM'])
print(f'slope: {slope:2.4f},\nintercept: {intercept:2.4f},\nRsquared: {r_value**2:2.4f}')
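As a sanity check, the R² from `scipy.stats.linregress` (i.e. `r_value**2`) should match scikit-learn's `LinearRegression.score`, since both fit ordinary least squares. A sketch on invented CO2-like data, not the Mauna Loa series itself:

```python
import numpy as np
import scipy.stats as stats
from sklearn.linear_model import LinearRegression

# Invented CO2-like trend: ~1.5 PPM per year from a 315 PPM baseline, plus noise
rng = np.random.default_rng(0)
years = np.arange(1959, 2020)
ppm = 1.5 * (years - 1959) + 315 + rng.normal(0, 1.0, len(years))

# Same fit, two libraries
slope, intercept, r_value, p_value, std_err = stats.linregress(years, ppm)
lm = LinearRegression().fit(years.reshape(-1, 1), ppm)
r2_sklearn = lm.score(years.reshape(-1, 1), ppm)

print(f'linregress R2: {r_value**2:.6f}, sklearn R2: {r2_sklearn:.6f}')
```

The two figures agree to floating-point precision, which is a useful cross-check when mixing the libraries as this notebook does.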
Exploring the Met Office data
I can start by using the Pandas groupby and pivot methods to manipulate the data into a format where Columns are Month, Rows are Year and the values are the mean maximum Temperature in centigrade (TmaxC) for each Month. Round the mean to 2 decimal places to make it easier to read.
df_View1 = oweather[['Year','Month','TmaxC']]
df_group1 = df_View1.groupby(['Year','Month'],as_index=False).mean().round(2)
df_pivot_group1 = df_group1.pivot(index='Year',columns='Month')
df_pivot_group1.tail()
A Seaborn plot will produce a 'heat' map of the entire data set.
plt.figure(figsize=(12,12))
plt.figtext(.5,.9,'Fig. 2 Heat map of monthly Temperatures centigrade',
fontsize=20, ha='center')
plt.ylabel('Year')
plt.xlabel('Month')
sns.heatmap(data=df_pivot_group1,cmap='Reds');
Pandas makes it easy to produce basic stats for each month of the data. Use the pandas .T attribute to flip the table and make it easier to read.
df_pivot_group1.describe().T
Already we can see the summary of the data gives some insights. The count of data points drops from 167 to 166 after June because that is where the data finishes, i.e. June 2019. It is interesting that February has the highest StD of all months by quite some margin.
The 'hottest' months are obviously, July and August.
July looks to be the hottest month of the year with the highest mean, highest max and the lowest minimum temperature. For this exercise, I'm going to focus on July.
I will create a new DF to concentrate on July and then create a rolling average of the data to try and reduce noise and see if any pattern emerges.
I have chosen 25 years because other studies I have seen have used 13 months to average out over one year, or somewhere between 3 and 5 years. Let's see what happens with a longer period.
rolling_average = 25
df_July = pd.DataFrame(df_pivot_group1.TmaxC[7])
df_July.reset_index(inplace=True)
df_July.rename(columns={7:'July'},inplace=True)
df_July['Rolling Average'] = df_July.iloc[:,1].rolling(window=rolling_average).mean()
df_July.sample(5)
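One side effect of a 25 year window worth noting: `rolling(...).mean()` yields NaN for the first window − 1 entries, which is why the rolling average line starts 24 years into the plot. A small sketch with invented values:

```python
import pandas as pd

# 30 invented 'years' of July temperatures
s = pd.Series(range(1, 31), dtype=float)

ra = s.rolling(window=25).mean()
print(ra.isna().sum())                 # the first 24 entries are NaN

# min_periods=1 fills the start with shorter-window means instead
ra_partial = s.rolling(window=25, min_periods=1).mean()
print(ra_partial.isna().sum())
```

Whether `min_periods` is appropriate depends on taste: the partial means at the start are noisier than the full 25-year ones.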
Now that we have the DataFrame df_July, with all data measurements and a rolling average, we can look at some basic stats.
MAX = df_July[['Year','July']].max()
MIN = df_July[['Year','July']].min()
MEAN = df_July[['July']].mean()
MEDIAN = df_July[['July']].median()
MODE = df_July[['July']].mode()
print('\033[1m')
print(f'Maximum July temperature was in Year {MAX[0]:3.0f} and was {MAX[1]} Centigrade')
print(f'Minimum July temperature was in Year {MIN[0]:3.0f} and was {MIN[1]} Centigrade')
print(f'Mean July temperature was {MEAN[0]:3.2f} Centigrade')
print(f'Mode July temperature was {MODE.July[0]} Centigrade')
print(f'Median July temperature was {MEDIAN[0]} Centigrade')
Exploring the Met Office data for the Month of July
I want to see if the data is representative of a normal distribution. Pandas allows me to run a Shapiro test to check this via the scipy stats package. But first we have to drop 2019 from the data; the null values in 2019 throw a wobbly. A value of 0.05 has been used as the test for significance.
df_July.drop(df_July.index[166],inplace=True)
stat, p = stats.shapiro(df_July.July)
print('Statistics=%.3f, p=%.3f' % (stat, p))
# interpret
alpha = 0.05
if p > alpha:
print('Sample passes for normality')
else:
print('Sample is not normal')
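scipy.stats also provides D'Agostino and Pearson's `normaltest` as a cross-check on Shapiro; the two can disagree near the threshold, so agreement adds confidence. A sketch on seeded synthetic data rather than the July series:

```python
import numpy as np
import scipy.stats as stats

# Seeded synthetic sample, drawn from a genuine normal distribution
rng = np.random.default_rng(42)
sample = rng.normal(loc=21.5, scale=1.5, size=166)

stat_s, p_s = stats.shapiro(sample)
stat_n, p_n = stats.normaltest(sample)

alpha = 0.05
for name, p in [('Shapiro', p_s), ('normaltest', p_n)]:
    print(f'{name}: p={p:.3f},', 'passes for normality' if p > alpha else 'is not normal')
```

Remember that failing to reject the null only means the sample is consistent with normality, not that it is proven normal.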
I can plot the combined data now using Seaborn, which draws a linear regression line through the data and shows a scatter plot of TmaxC and the rolling average.
sns.set(rc={'figure.figsize':(12,6)})
fig, ax = plt.subplots()
plt.figtext(.5,.9,f'Fig. 3 Scatter plot and {rolling_average} year rolling average for July',
fontsize=20, ha='center')
sns.regplot(df_July.Year,df_July.July,ax=ax)
plt.xlabel('Year')
plt.ylabel('Temperature centigrade')
plt.plot(df_July.Year,df_July['Rolling Average']);
The addition of a 25 year rolling average highlights some downward and upward trends within the data. Bear in mind the scale of the graph, which can be misleading. There are 3 points which are higher than any other points in the data and it would be worthwhile finding out where they are. I will use the pandas .loc method to check for Years with a TmaxC over 26°C.
df_July.loc[lambda df_July: df_July['July'] > 26]
And the years are 1983, 2006 and 2018. 2019, which was not included, was also 27.4°C.
We can deduce the formula for the line that Seaborn has kindly drawn for us by using the LinearRegression class from scikit-learn.
lm = LinearRegression()
X = df_July[['Year']]
Y = df_July['July']
lm.fit(X, Y)
Yhat = lm.predict(X)
C_July = lm.intercept_
M_July = lm.coef_[0]
print(f'The equation for the linear regression July is: Y = {M_July:2.5f} * Year + {C_July:2.5f}')
Yhat is the variable storing the predicted values for the data set based on the linear regression model. Let's have a look at how they stack up by extracting a sample of 5 points.
df_July['Yhat']= Yhat
df_July.sample(5)
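Once fitted, the model can be evaluated at any year, though extrapolating a weak linear fit outside the data range is speculative at best. A self-contained sketch with an invented trend of the same general form (the slope and intercept here are made up, not the fitted July values):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Invented July-like data: a gentle warming trend plus noise
rng = np.random.default_rng(1)
years = np.arange(1853, 2019).reshape(-1, 1)
temps = 0.008 * years.ravel() + 6.6 + rng.normal(0, 1.3, len(years))

lm = LinearRegression().fit(years, temps)

# Evaluate the fitted line inside and outside the data range
inside = lm.predict(np.array([[1950]]))[0]
outside = lm.predict(np.array([[2050]]))[0]   # pure extrapolation - treat with suspicion
print(f'1950: {inside:.2f} C, 2050: {outside:.2f} C')
```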
I can check the R2 value and some other details.
slope, intercept, r_value, p_value, std_err = stats.linregress(df_July['Year'], df_July['July'])
print(f'slope: {slope:2.4f},\nintercept: {intercept:2.4f},\nRsquared: {r_value**2:2.4f}')
The Linear Model has an R2 value of 0.362, which explains only some 36% of the variance in the data.
fig, ax = plt.subplots()
plt.figtext(.5,.9,f'Fig. 4 Residual plot of Linear Model for July',
fontsize=20, ha='center')
plt.xlabel('Year')
plt.ylabel('Residual')
sns.residplot(x=df_July['Year'],y=df_July['July'], lowess=True, color="g");
Seaborn kindly plots a Lowess line. In this case we would perhaps have missed the bow in the line, given the good scatter of the residual points.
slope, intercept, r_value, p_value, std_err = stats.linregress(df_July.Year[24:],
df_July['Rolling Average'][24:])
print(f'slope: {slope:2.4f},\nintercept: {intercept:2.4f},\nRsquared: {r_value**2:2.4f}')
It would be interesting to see if there are any clusters in the raw data. I used 5 clusters based on the peaks and troughs from the rolling average.
#This grabs the data for the month we want to look at and
#converts it into a numpy array for further analysis
md = pd.DataFrame()
md['Year']=df_July.Year
md['July']=df_July.July
X =np.array(md)
#Use sklearn Kmeans to analyse the data
#Setup the colours for the clusters
colors=['g.','r.','y.','b.','c.']
#This defines the number of clusters to look for etc.
clf = KMeans(n_clusters=5,n_init=30,max_iter=900)
clf.fit(X)
centroids = clf.cluster_centers_
labels = clf.labels_
plt.figure(figsize=(12,6))
plt.xlabel('Year',fontsize=15)
plt.ylabel('Temperature - Centigrade',fontsize=15)
plt.title('Fig. 5 KMeans Scatter Plot for month July Temperature - Centigrade with Rolling Average',
fontsize=15)
for i in range(len(X)):
plt.plot(X[i][0],
X[i][1],
colors[labels[i]],
markersize=15)
plt.scatter(centroids[:,0],
centroids[:,1],
marker='x',
s=50,
linewidth=3,
color='black')
plt.plot(df_July.Year,df_July['Rolling Average'])
#Linear model for July, using the slope and intercept found earlier
plt.plot(df_July.Year, M_July * df_July.Year + C_July);
Interestingly, the KMeans algorithm has picked out the clusters which roughly correspond to the Rolling average line.
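The choice of 5 clusters was made by eye from the rolling average. A common sanity check is the 'elbow' method: plot KMeans inertia against k and look for where the drop flattens. A sketch on invented two-column (Year, temperature) data:

```python
import numpy as np
from sklearn.cluster import KMeans

# Invented (Year, temperature) points of the same shape as the July data
rng = np.random.default_rng(0)
years = np.arange(1853, 2019)
temps = 21.5 + rng.normal(0, 1.3, len(years))
X = np.column_stack([years, temps])

inertias = []
for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)

# Inertia falls as k grows; the 'elbow' is where the drop flattens out
for k, inertia in enumerate(inertias, start=1):
    print(k, round(inertia, 1))
```

Note that with Year and temperature on such different scales, KMeans distances are dominated by the Year axis; standardising the columns first would change the clusters it finds.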
Exploring the Met Office Average Yearly Data
What if we look at the Annual Mean temperatures over the data set? I need to drop off 2019 as it's not a full year, and create some new DFs based on our original data using a simple .mean method.
df3 = df_View1.copy()
#Drop the rows for 2019 (an incomplete year)
df3.drop(df3.index[1992:1998],inplace=True)
grouped = df3.groupby('Year')['TmaxC'].mean()
grouped =pd.DataFrame(grouped)
grouped.reset_index(inplace=True)
grouped.sample(5)
For comparison with the analysis for July, we should also look at a rolling average. First create the rolling average and then plot with Seaborn regplot. We still have to work out the linear regression equation as Seaborn doesn't give it.
grouped['rolling_average'] = grouped.iloc[:,1].rolling(window=rolling_average).mean()
fig, ax = plt.subplots()
plt.figtext(.5,.9,f'Fig. 6 Scatter plot and {rolling_average} year rolling average of Yearly Average',
fontsize=18, ha='center')
sns.regplot(grouped.Year,grouped.TmaxC,color='g')
plt.xlabel('Year')
plt.ylabel('Temperature centigrade')
plt.plot(grouped.Year,grouped.rolling_average);
As a matter of interest, it would be nice to know when that horrendously low value was.
grouped.loc[lambda grouped: grouped['TmaxC'] < 11.0]
And here is the data for the year 1879:
oweather.loc[lambda oweather: oweather['Year'] == 1879]
A link to 1879 being a very cold year and a more in depth look here
There is also a nice cluster of high data points from the mid 1980s until 2018.
Three El Nino events classed as Very Strong were in 1982-83, 1997-98 and 2015-16. It is pure conjecture given my data to say they are related, but it is worth mentioning.
Scikit-learn's LinearRegression can be used, again, to work out the equation of the linear regression.
lm = LinearRegression()
X = grouped[['Year']]
Y = grouped['TmaxC']
lm.fit(X, Y)
YhatYear = lm.predict(X)
CYear = lm.intercept_
MYear = lm.coef_
print(f'c = {CYear:2.5f}')
print(f'm = {MYear[0]:2.5f}')
So the Linear regression model for Average Annual Tmax is Y = 0.00830(X) − 2.13667, where X is the Year.
Again, a residuals plot should be used to check the linear regression model.
fig, ax = plt.subplots()
sns.residplot(x=grouped['Year'],y=grouped['TmaxC'], lowess=True, color="g")
plt.figtext(.5,.9,f'Fig. 7 Residual plot of Linear Model annual Mean TmaxC ',
fontsize=20, ha='center')
plt.xlabel('Year')
plt.ylabel('Residual');
The Linear regression Model Equations
Exploring the Met Office Average Yearly Data Rolling Average
Now I want to find out the linear regression model for the rolling average data. I need to drop the Null or NaN values.
df_rolling = grouped.dropna().copy()
lm = LinearRegression()
X = df_rolling[['Year']]
Y = df_rolling['rolling_average']
lm.fit(X, Y)
YhatRA = lm.predict(X)
CYear = lm.intercept_
MYear = lm.coef_
print(f'C Rolling Average = {CYear:2.5f}')
print(f'M Rolling Average = {MYear[0]:2.5f}')
#x = np.array([i for i in range(1850,2019)])
y_year = MYear * X + CYear
The linear equation for the rolling average, 1850 to 2018, is:
Y = 0.00689(X) + 0.44234
plt.plot(X,y_year,'-g',label='Annual Avg Tmax Linear Model Y=0.00689* x + 0.44234')
plt.plot(grouped.Year,grouped.rolling_average, label='25 Year Rolling Avg of mean yearly Tmax')
plt.figtext(.5,.9,f'Fig. 8 {rolling_average} year rolling average of Mean Annual Tmax V Linear Regression Model',
fontsize=20, ha='center')
plt.legend(loc='best');
slope, intercept, r_value, p_value, std_err = stats.linregress(df_rolling['Year'], df_rolling['rolling_average'])
print(f'slope: {slope:2.4f} \nintercept: {intercept:2.4f} \nRsquared Value: {r_value**2:2.4f} \nPvalue: {p_value:2.2f}')
From the R2 value it can be concluded that the linear regression model accounts for only some 56.32% of the variance in temperatures.
Let's plot this against the residuals from the model.
fig, ax = plt.subplots()
sns.residplot(df_rolling.Year,df_rolling.rolling_average, lowess=True, color="g",robust=False)
plt.figtext(.5,.9,f'Fig. 9 {rolling_average} year rolling average Avg Annual Tmax Residual',
fontsize=20, ha='center')
plt.xlabel('Year')
plt.ylabel(f'Residual');
Now we see a distinct pattern in the data which suggests that a simple linear regression is not the best model.
Create a new DF for the rolling average values
df_rolling = grouped[['Year','rolling_average']].dropna().reset_index(drop=True)
df_rolling.head()
Now let's try and fit a polynomial using the numpy polyfit method.
#Fit on centred Years to keep the 6th degree fit numerically stable
xc = df_rolling.Year - df_rolling.Year.mean()
z = np.poly1d(np.polyfit(xc,df_rolling.rolling_average,6))
qw = z(xc)
plt.title('Fig 10 6th Degree polynomial curve fitted to data',fontsize=20)
plt.ylabel('temperature C')
plt.plot(df_rolling.Year, df_rolling.rolling_average, 'o', label='original data')
plt.plot(df_rolling.Year,qw);
That curve fits the data much better. There are problems with using a polynomial, especially at the boundaries: they are useless at predicting past and future values! This is a 6th degree polynomial with the equation below.
print(z)
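'Fits much better' can be quantified: the same R² formula, 1 − SS_res/SS_tot, applies to the polynomial's predictions just as it does to the linear model's. A self-contained sketch on an invented wavy series, comparing degree 1 and degree 6:

```python
import numpy as np

# Invented wavy series standing in for the 25 year rolling average
x = np.linspace(0, 14.4, 145)
y = 13.0 + 0.5 * np.sin(x / 2.0) + 0.04 * x

def r_squared(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1 - ss_res / ss_tot

lin = np.poly1d(np.polyfit(x, y, 1))
poly = np.poly1d(np.polyfit(x, y, 6))

r2_lin = r_squared(y, lin(x))
r2_poly = r_squared(y, poly(x))
print(f'linear R2: {r2_lin:.4f}, degree-6 R2: {r2_poly:.4f}')
```

A higher in-sample R² is expected for any higher-degree polynomial, so the gap only tells us the curve describes the data better, not that it predicts better.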
Let's check the predicted values based on the polynomial curve.
df_rolling['predicted']=qw
df_rolling.sample(10)
Checking the data from other sources
There are misgivings among people about the reporting of meteorological data. Remember our TmaxC for June 2019 was 19.8°C from the Met Office?
Well, here is the data from Benson in Oxfordshire for the same month, some 14 miles south of Oxford.
tables = pd.read_html(
"https://www.timeanddate.com/weather/uk/oxford/historic?month=6&year=2019")
Table1 = tables[0]
Table1
Is 19.8°C an average of the TmaxC temperatures measured for June 2019?
Is 11°C the average of the minimum TminC temperatures measured for June 2019?
This will involve a bit of 'Pencil Work'!
First the Max temperatures by day from Benson.
June20196TmaxC = [25,24,19,19,18,19,17,18,19,13,13,14,14,18,18,19,19,18,19,20,20,22,23,26,20,19,24,25,33,23]
print(f'The average June temperature for 2019 was: {sum(June20196TmaxC)/len(June20196TmaxC):2.2f}')
#What if the 33C maximum on 29 June had been, say, 24C?
June20196TmaxC[28] = 24
round((sum(June20196TmaxC)/len(June20196TmaxC)),2)
Now that makes sense. The Met Office figure looks like an average of the maximum temperatures per day, which averaged out at 19.93°C.
Now for the minimum temperatures.
These are recorded by the Met Office as 11°C.
It should be noted that the 33°C maximum is a significant deviation from the other measured temperatures, but there is no reason to doubt it.
June20196TminC = [9,11,8,8,8,7,8,8,5,7,16,9,11,9,8,7,11,8,14,9,6,5,11,15,16,13,12,10,12,14]
print(f'The average June Minimum Temperature was {sum(June20196TminC)/len(June20196TminC):2.2f}')
Now that does not make sense.
9.83°C is significantly different from 11°C.
Just to be sure, I will show that the average as calculated from the Benson data is:
print(round((sum(June20196TmaxC)/len(June20196TmaxC)
             + sum(June20196TminC)/len(June20196TminC)) / 2))
And there we have our 15°C average temperature from the Benson data.
Benson is a small village (pop. 4,754) out in the country, which could explain the lower TminC.
Some thoughts on the results
And there's more!
Although this is a relatively small data sample, it is a great data set to work on.
I want to draw a graph with ranges of Standard Deviations from -4 to +4 and plot the data points to get a better visual of where on a normal distribution the data points lie. I am fed up with 'anomalies'! First task is to make a copy of the original grouped DF and round the data points to 2 decimal places.
df_yearly_average = grouped.copy(deep=True)
df_yearly_average['TmaxC'] = df_yearly_average.TmaxC.round(2)
df_yearly_average.describe()
Checking the average by taking multiple sample averages
I want to make a quick check on the average by taking sample averages and working out an average based on the samples. This will be different every time you run the code.
sample_averages = []
for i in range(50):
sample_averages.append(df_yearly_average.TmaxC.sample(30).mean())
round(sum(sample_averages)/len(sample_averages),3)
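This is essentially a demonstration of the sampling distribution of the mean: averages of repeated samples cluster tightly around the full-data average. A seeded, self-contained version on invented yearly averages (mean and SD chosen to resemble the real ones):

```python
import numpy as np
import pandas as pd

# Invented yearly averages: mean 13.93, SD 0.85, 169 'years'
rng = np.random.default_rng(0)
population = pd.Series(rng.normal(loc=13.93, scale=0.85, size=169))

# 50 samples of 30 values each; average the sample means
sample_averages = [population.sample(30, random_state=i).mean() for i in range(50)]
estimate = sum(sample_averages) / len(sample_averages)

print(round(estimate, 3), 'vs full-data mean', round(population.mean(), 3))
```

Seeding `random_state` makes the run repeatable, unlike the cell above.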
Yavg = round(df_yearly_average.TmaxC.mean(),2)
Ystd = round(df_yearly_average.TmaxC.std(),2)
print('Average TmaxC is ',Yavg)
print('Standard Deviation of TmaxC is ',Ystd)
I can then make a list of temperature ranges based on standard deviations from the average, e.g. everything falling between -3 and +3 SDs accounts for 99.7% of all data points.
TmaxCrange = []
for i in range(-4,4,1):
Temp = round(Yavg + Ystd*i,2)
TmaxCrange.append(Temp)
TmaxCrange
Let's find out the minimum and maximum temperatures for the TmaxC annual average.
print('The minimum temperature for TmaxC was ',df_yearly_average.TmaxC.min())
print('The maximum temperature for TmaxC was ',df_yearly_average.TmaxC.max())
An ugly function to check the classification.
def classifyTemp(t,rg):
    #rg holds the average plus -4 to +3 standard deviations
    if rg[0] <= t < rg[1]:
        return -4
    elif rg[1] <= t < rg[2]:
        return -3
    elif rg[2] <= t < rg[3]:
        return -2
    elif rg[3] <= t < rg[4]:
        return -1
    elif rg[4] <= t < rg[5]:
        return 1
    elif rg[5] <= t < rg[6]:
        return 2
    elif rg[6] <= t < rg[7]:
        return 3
    else:
        return 4
Now apply the function to our average temperatures and classify them into standard-deviation bands. Out of interest to pythonistas: I tried several methods to do this via apply, map etc. and eventually used this hack. It extracts the temperature data as a pandas series, applies the function to a list of the values, creating a new series which is added to the original data frame.
s123 = pd.Series(df_yearly_average.TmaxC)
df_yearly_average['classed'] = pd.Series([classifyTemp(i,TmaxCrange) for i in s123])
df_yearly_average.head()
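For the pythonistas again: the same banding can be done without an explicit function using `pd.cut`, which assigns each value to a bin defined by a list of edges. A sketch with invented values; the edges mirror `TmaxCrange` and the labels are illustrative:

```python
import pandas as pd

# Edges mirror TmaxCrange: mean 13.93, SD 0.85, from -4 to +4 SDs (9 edges, 8 bins)
edges = [13.93 + 0.85 * i for i in range(-4, 5)]
labels = [-4, -3, -2, -1, 1, 2, 3, 4]          # illustrative band labels

temps = pd.Series([12.5, 13.9, 15.7, 14.4])     # invented yearly averages
classed = pd.cut(temps, bins=edges, labels=labels)
print(classed.tolist())
```

`pd.cut` also handles the edge cases consistently (intervals are half-open by default) and vectorises over the whole column in one call.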
cmap = sns.light_palette("Green", as_cmap=True)
sns.scatterplot(x=df_yearly_average.Year,
y=df_yearly_average.classed,
size="TmaxC",
sizes=(120,120),
palette=cmap,
data=df_yearly_average,
hue='classed',
legend=False
)
plt.xticks(np.arange(min(df_yearly_average.Year),
max(df_yearly_average.Year)+1, 5.0),
rotation=60)
plt.title('Fig 11 Data points per year above below average for whole data set',fontsize=20)
plt.ylabel('Standard Deviations',fontsize = 18)
plt.xlabel('Year',fontsize = 15)
plt.show()
Indeed, from 1988 there is an increasing trend away from the data set average; that is, there are no data points less than 1 SD away from the average. To put it into context: the average is 13.93°C, the highest temperature was 15.85°C, and 15.63°C is 2 SDs above average.
Are you sick of seeing 2d plots with a line through them?
Like the plot below of all TmaxC data in the original file?
ow_x = [i for i in range(len(oweather))]
sns.regplot(ow_x,oweather.TmaxC);
Here is a 3D representation of the same data as Fig. 2. If you uncomment the line '%matplotlib qt', a separate window will open and you can rotate the graph.
oweather.describe()
df_temp = pd.DataFrame()
df_temp['X'] = oweather.Year
df_temp['Y'] = oweather.Month
df_temp['Z'] = oweather.TmaxC
#%matplotlib qt
#%matplotlib inline
fig = plt.figure(figsize=(11,9))
ax = fig.add_subplot(projection='3d')
ax.plot_trisurf(df_temp.X, df_temp.Y,df_temp.Z,cmap=plt.cm.rainbow, linewidth=0.2)
fig.tight_layout()
plt.title('Fig. 12 3D plot of TmaxC monthly and yearly',fontsize=20)
plt.show()
There is a wonderful resource on using matplotlib here, which is where I pinched most of the code for these graphs. The code below is probably more interesting to Python programmers.
#x = df_temp.X #Year
#y = df_temp.Z #TmaxC
data = df_temp[['Y', 'Z']].to_numpy() #This is about the only change I made to the code.
x,y = data.T
fig, axes = plt.subplots(ncols=6, nrows=1, figsize=(35, 10))
# Everything starts with a scatterplot
axes[0].set_title('Scatterplot')
axes[0].plot(x, y, 'ko')
# As you can see there is a lot of overplotting here!
# Thus we can cut the plotting window in several hexbins
nbins = 40
axes[1].set_title('Hexbin')
axes[1].hexbin(x, y, gridsize=nbins, cmap=plt.cm.rainbow)
# 2D Histogram
axes[2].set_title('2D Histogram')
axes[2].hist2d(x, y, bins=nbins, cmap=plt.cm.rainbow)
# Evaluate a gaussian kde on a regular grid of nbins x nbins over data extents
k = kde.gaussian_kde(data.T)
xi, yi = np.mgrid[x.min()-2:x.max()+2:nbins*1j, y.min():y.max():nbins*1j]
zi = k(np.vstack([xi.flatten(), yi.flatten()]))
# plot a density
axes[3].set_title('Calculate Gaussian KDE')
axes[3].pcolormesh(xi, yi, zi.reshape(xi.shape), cmap=plt.cm.rainbow)
# add shading
axes[4].set_title('2D Density with shading')
axes[4].pcolormesh(xi, yi, zi.reshape(xi.shape), shading='gouraud', cmap=plt.cm.rainbow)
# contour
axes[5].set_title('Contour')
axes[5].pcolormesh(xi, yi, zi.reshape(xi.shape), shading='gouraud', cmap=plt.cm.rainbow)
axes[5].contour(xi, yi, zi.reshape(xi.shape) )
Graph 6 above, labelled 'Contour', is interesting in that it seems to show a hot spot around February to March and in November.
Well, not exactly 'hot': more a concentration of temperatures at that level on the Y axis.
plt.figure(figsize = (11,6))
plt.pcolormesh(xi, yi, zi.reshape(xi.shape), shading='gouraud', cmap=plt.cm.rainbow)
plt.contour(xi, yi, zi.reshape(xi.shape) )
plt.title('Contour plot of KDE of all TmaxC temperature measurements',fontsize=16)
plt.xlabel('Month')
plt.ylabel('Temp centigrade')
It's important to know what we are looking at here. This is essentially a 'shadow' of Fig. 12's kernel densities in 2D, for TmaxC. It shows the probability of temperatures for each month based on the period 1853 - 2018.
So one thought that occurs to me is that a sequence of these plots based on 25 year periods (7 graphs) could give a better visual clue as to how temperatures have moved over the period of the data set.
#Split into 7 DataFrames of 300 rows (25 years of monthly data) each
df_temp1 = df_temp[0:300]
df_temp2 = df_temp[300:600]
df_temp3 = df_temp[600:900]
df_temp4 = df_temp[900:1200]
df_temp5 = df_temp[1200:1500]
df_temp6 = df_temp[1500:1800]
df_temp7 = df_temp[1800:]
#The same KDE contour plot for each 25 year period in turn
for n, df_split in enumerate([df_temp1, df_temp2, df_temp3,
                              df_temp4, df_temp5, df_temp6, df_temp7], start=1):
    data = df_split[['Y', 'Z']].to_numpy()
    x, y = data.T
    k = kde.gaussian_kde(data.T)
    xi, yi = np.mgrid[-2:x.max()+2:nbins*1j, -4:30:nbins*1j]
    zi = k(np.vstack([xi.flatten(), yi.flatten()]))
    plt.figure(figsize = (11,6))
    plt.pcolormesh(xi, yi, zi.reshape(xi.shape), shading='gouraud', cmap=plt.cm.rainbow)
    plt.contour(xi, yi, zi.reshape(xi.shape))
    plt.title(f'Contour plot of KDE of TmaxC measurements, 25 year period {n}', fontsize=16)
    plt.xlabel('Month')
    plt.ylabel('Temp centigrade')